JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation (2311.18618v1)
Abstract: Part-aware panoptic segmentation is a computer vision problem that aims to provide a semantic understanding of the scene at multiple levels of granularity. More precisely, semantic areas, object instances, and semantic parts are predicted simultaneously. In this paper, we present our Joint Panoptic Part Fusion (JPPF), which combines the three individual segmentations effectively to obtain a panoptic-part segmentation. Two aspects are of utmost importance for this: First, a unified model for the three problems is desired, one that allows for mutually improved and consistent representation learning. Second, the combination must be balanced so that it gives equal importance to all individual results during fusion. Our proposed JPPF is parameter-free and dynamically balances its input. The method is evaluated and compared on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets in terms of PartPQ and Part-Whole Quality (PWQ). In extensive experiments, we verify the importance of our fair fusion, show that its impact is most significant for areas that can be further segmented into parts, and demonstrate the generalization capabilities of our design, without fine-tuning, on 5 additional datasets.
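For readers unfamiliar with PartPQ, the following is a minimal sketch of how a per-class score of this kind can be computed. It assumes the commonly used definition: the summed IoU of matched segments divided by |TP| + 0.5|FP| + 0.5|FN|, where a matched segment of a class with parts is credited with the mean IoU over its part classes instead of its instance-level IoU. The function names and the input layout are hypothetical and are not taken from the paper or from any evaluation toolkit.

```python
from statistics import mean


def segment_iou(instance_iou, part_ious=None):
    """IoU credited to one matched (true-positive) segment.

    Assumption: classes with parts use the mean IoU over their part
    classes; classes without parts use the instance-level IoU.
    """
    if part_ious:
        return mean(part_ious)
    return instance_iou


def part_pq(matches, num_fp, num_fn):
    """PQ-style score for a single class.

    matches: list of (instance_iou, part_ious or None), one entry per
             true-positive match between a predicted and a ground-truth
             segment (hypothetical input layout).
    num_fp:  number of unmatched predicted segments.
    num_fn:  number of unmatched ground-truth segments.
    """
    denom = len(matches) + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # class absent from both prediction and ground truth
    return sum(segment_iou(i, p) for i, p in matches) / denom


# Usage: one matched segment with two parts, one matched segment without
# parts, plus one false positive and one false negative.
score = part_pq([(0.9, [0.8, 0.6]), (0.7, None)], num_fp=1, num_fn=1)
```

In this sketch, the per-class scores would then be averaged over all classes to obtain a dataset-level number.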
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Cordts M, Omran M, Ramos S, et al. (2016) The cityscapes dataset for semantic urban scene understanding. In: Conference on Computer Vision and Pattern Recognition (CVPR) Dong et al. [2013] Dong J, Chen Q, Xia W, et al. (2013) A deformable mixture parsing model with parselets. In: International Conference on Computer Vision (ICCV) Gao et al. [2019] Gao N, Shan Y, Wang Y, et al. 
(2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. 
[2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Dong J, Chen Q, Xia W, et al. 
(2013) A deformable mixture parsing model with parselets. In: International Conference on Computer Vision (ICCV) Gao et al. [2019] Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. 
(2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. 
[2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. 
[2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Cheng B, Collins MD, Zhu Y, et al. 
(2020) Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Cordts et al. [2016] Cordts M, Omran M, Ramos S, et al. (2016) The cityscapes dataset for semantic urban scene understanding. In: Conference on Computer Vision and Pattern Recognition (CVPR) Dong et al. [2013] Dong J, Chen Q, Xia W, et al. (2013) A deformable mixture parsing model with parselets. In: International Conference on Computer Vision (ICCV) Gao et al. [2019] Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. 
In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Cordts M, Omran M, Ramos S, et al. (2016) The cityscapes dataset for semantic urban scene understanding. In: Conference on Computer Vision and Pattern Recognition (CVPR) Dong et al. [2013] Dong J, Chen Q, Xia W, et al. (2013) A deformable mixture parsing model with parselets. In: International Conference on Computer Vision (ICCV) Gao et al. [2019] Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. 
[2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. 
(2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. 
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. 
[2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) de Geus D, Meletis P, Lu C, et al. 
(2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. 
In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. 
[2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. 
In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness.
In: International Conference on Computer Vision (ICCV)
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
[2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. 
[2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. 
(2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. 
(2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. 
[2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. 
In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. 
(2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. 
[2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
[2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. 
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV) Geiger et al. [2013] Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. 
[2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR) de Geus et al. [2021] de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Gong et al. [2018] Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. 
(2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. 
[2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. 
In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al.
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. 
[2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. 
[2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. 
In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. 
[2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. 
In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. 
(2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. 
[2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. 
[2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. 
In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. 
[2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
[2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. 
[2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Gao N, Shan Y, Wang Y, et al. (2019) Ssap: Single-shot instance segmentation with affinity pyramid. In: International Conference on Computer Vision (ICCV)
- Geiger A, Lenz P, Stiller C, et al. (2013) Vision meets robotics: The kitti dataset. The International Journal of Robotics Research (IJRR)
- de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV)
- Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV)
- Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)

Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV) Gong et al. [2019] Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. 
[2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. 
[2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. 
In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. 
[2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Hariharan et al. [2014] Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. 
In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. 
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. 
(2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. 
[2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. 
[2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. 
[2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. 
(2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. 
[2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. 
[2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
Transactions on Circuits and Systems for Video Technology (T-CSVT) Luo et al. [2018] Luo X, Su Z, Guo J, et al.
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
[2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. 
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- de Geus D, Meletis P, Lu C, et al. (2021) Part-aware panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Gong K, Liang X, Li Y, et al. (2018) Instance-level human parsing via part grouping network. In: European Conference on Computer Vision (ECCV)
- Gong K, Gao Y, Liang X, et al. (2019) Graphonomy: Universal human parsing via graph transfer learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV)
- Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. 
(2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. 
[2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. 
[2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV) Hariharan et al. [2015] Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2016] He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. 
[2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. 
In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. 
[2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR) He et al. [2017] He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV) Jagadeesh et al. [2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. 
(2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. 
British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. 
[2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. 
In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. 
In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. 
[2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. 
In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. 
In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. 
[2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
- Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask R-CNN. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
- Hariharan B, Arbeláez P, Girshick R, et al. (2014) Simultaneous detection and segmentation. In: European Conference on Computer Vision (ECCV)
- Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. 
arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. 
In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. 
(2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. 
[2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Hariharan B, Arbeláez P, Girshick R, et al. (2015) Hypercolumns for object segmentation and fine-grained localization. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2023] Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. 
(2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. 
[2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. 
[2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. 
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- He K, Zhang X, Ren S, et al. (2016) Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- He K, Gkioxari G, Dollár P, et al. (2017) Mask R-CNN. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A CNN model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A CNN model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. In: British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly- and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a unified model for panoptic part segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) PanopticPartFormer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft COCO: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin Transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A ConvNet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-Panoptic-Parts and Pascal-Panoptic-Parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) GMNet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient panoptic segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The Mapillary Vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) AdaptIS: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves ImageNet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) UPSNet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing R-CNN for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) DeeperLab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) BDD100K: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) ResNeSt: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM) Jiang and Chi [2018] Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. 
Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP) Jiang and Chi [2019] Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. 
arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. 
In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
- He K, Gkioxari G, Dollár P, et al. (2017) Mask r-cnn. In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences Kirillov et al. [2019a] Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Kirillov et al. [2019b] Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. 
[2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Ladicky et al. [2013] Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. 
[2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. 
[2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. 
(2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. 
In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. 
[2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. 
In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV)
- Jagadeesh SK, Schuster R, Stricker D (2023) Multi-task fusion for efficient panoptic-part segmentation. In: International Conference on Pattern Recognition Applications and Methods (ICPRAM)
- Jiang Y, Chi Z (2018) A cnn model for semantic person part segmentation with capacity optimization. Transactions on Image Processing (T-IP)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. 
(2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. 
(2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. 
[2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. 
[2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
[2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. 
(2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. 
arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Jiang Y, Chi Z (2019) A cnn model for human parsing based on capacity optimization. Applied Sciences
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: International Conference on Computer Vision (ICCV) Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2018a] Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Sofiiuk et al.
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, Girshick R, He K, et al. (2019a) Panoptic feature pyramid networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Kirillov A, He K, Girshick R, et al. (2019b) Panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Ladicky L, Torr PH, Zisserman A (2013) Human pose estimation using a joint pixel-wise and part-wise formulation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:1812.01192
- Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. 
In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. 
(2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li J, Raventos A, Bhargava A, et al. (2018a) Learning to fuse things and stuff. arXiv preprint arXiv:181201192 Li et al. [2020a] Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. 
[2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li P, Xu Y, Wei Y, et al. (2020a) Self-correction for human parsing. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Li et al. [2017a] Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. 
(2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. British Machine Vision Conference (BMVC) Li et al. [2018b] Li Q, Arnab A, Torr PH (2018b) Weakly-and semi-supervised panoptic segmentation. In: European conference on computer vision (ECCV) Li et al. [2020b] Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. 
[2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
[2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. 
(2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. 
In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. 
[2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Ruan et al.
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. 
[2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. 
(2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. 
[2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. 
(2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. 
In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. 
Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. 
arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li Q, Arnab A, Torr PH (2017a) Holistic, instance-level human parsing. In: British Machine Vision Conference (BMVC)
- Li Q, Arnab A, Torr PH (2018b) Weakly- and semi-supervised panoptic segmentation. In: European Conference on Computer Vision (ECCV)
- Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a unified model for panoptic part segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) PanopticPartFormer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft COCO: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin Transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A ConvNet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) GMNet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient panoptic segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The Mapillary Vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) AdaptIS: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with Noisy Student improves ImageNet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) UPSNet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing R-CNN for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) DeeperLab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) BDD100K: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) ResNeSt: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Michieli U, Borsato E, Rossi L, et al. (2020) GMNet: Graph matching network for large scale part semantic segmentation in the wild.
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Q, Qi X, Torr PH (2020b) Unifying training and inference for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Li et al. [2022] Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. 
(2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV) Li et al. [2023] Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954 Li et al. [2017b] Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liang et al. [2018] Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. 
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. 
(2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Lin et al. [2020] Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. 
arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
- Li X, Xu S, Yang Y, et al. (2022) Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation. In: European Conference on Computer Vision (ECCV)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:2301.00954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin K, Wang L, Luo K, et al. 
(2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT) Lin et al. [2014] Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Lin TY, Maire M, Belongie S, et al. 
(2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV) Liu et al. [2019] Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. 
(2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. 
[2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Li X, Xu S, Yang Y, et al. (2023) Panopticpartformer++: A unified and decoupled view for panoptic part segmentation. arXiv preprint arXiv:230100954
- Li Y, Qi H, Dai J, et al. (2017b) Fully convolutional instance-aware semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Liu et al. [2018] Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI) Liu et al. [2021] Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. 
International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. 
Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. 
(2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Liang X, Gong K, Shen X, et al. (2018) Look into person: Joint body parsing & pose estimation network and a new benchmark. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft coco: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference On Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. 
(2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. 
[2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. 
(2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. 
[2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Lin K, Wang L, Luo K, et al. (2020) Cross-domain complementary learning using pose for multi-person part segmentation. Transactions on Circuits and Systems for Video Technology (T-CSVT)
- Lin TY, Maire M, Belongie S, et al. (2014) Microsoft COCO: Common objects in context. In: European Conference on Computer Vision (ECCV)
- Liu H, Peng C, Yu C, et al. (2019) An end-to-end network for panoptic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Liu S, Sun Y, Zhu D, et al. (2018) Cross-domain human parsing via adversarial feature and label adaptation. In: Conference on Artificial Intelligence (AAAI)
- Liu Z, Lin Y, Cao Y, et al. (2021) Swin Transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV)
- Liu Z, Mao H, Wu CY, et al. (2022) A ConvNet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-Panoptic-Parts and Pascal-Panoptic-Parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) GMNet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The Mapillary Vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
Ruan et al.
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Lin Y, Cao Y, et al. (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision (ICCV) Liu et al. [2022] Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR) Luo et al. [2013] Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. 
(2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV) Luo et al. [2018] Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. 
(2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. 
[2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. 
In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
(2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. 
[2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. 
arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM) Meletis et al. [2020] Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. 
In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:200407944 Michieli et al. [2020] Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. 
[2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. 
[2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. 
Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
- Liu Z, Mao H, Wu CY, et al. (2022) A convnet for the 2020s. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: International Conference on Computer Vision (ICCV)
- Luo X, Su Z, Guo J, et al. (2018) Trusted guidance pyramid network for human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Meletis P, Wen X, Lu C, et al. (2020) Cityscapes-panoptic-parts and pascal-panoptic-parts datasets for scene understanding. arXiv preprint arXiv:2004.07944
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV)
- Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV)
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. 
In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. 
[2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. 
[2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. 
(2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. 
(2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. 
(2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
- Michieli U, Borsato E, Rossi L, et al. (2020) Gmnet: Graph matching network for large scale part semantic segmentation in the wild. In: European Conference on Computer Vision (ECCV) Mohan and Valada [2021] Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Mohan R, Valada A (2021) EfficientPS: Efficient Panoptic Segmentation. International Journal of Computer Vision (IJCV) Neuhold et al. [2017] Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The mapillary vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV) O Pinheiro et al. [2015] O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. 
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. 
[2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. 
In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. 
(2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. 
(2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. 
Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. 
[2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. 
International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. 
In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. 
- Neuhold G, Ollmann T, Rota Bulo S, et al. (2017) The Mapillary Vistas dataset for semantic understanding of street scenes. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS)
- Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
- Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Qiao S, Chen LC, Yuille A (2020) DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:2006.02334
- Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS)
- Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI)
- Sakaridis C, Dai D, Van Gool L (2021) ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV)
- Sofiiuk K, Barinova O, Konushin A (2019) AdaptIS: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV)
- Tan M, Le Q (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves ImageNet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) UPSNet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing R-CNN for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) DeeperLab: Single-shot image parser. arXiv preprint arXiv:1902.05093
- Yu F, Chen H, Wang X, et al. (2020) BDD100K: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) ResNeSt: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- O Pinheiro PO, Collobert R, Dollár P (2015) Learning to segment object candidates. Advances in Neural Information Processing Systems (NeurIPS) Pont-Tuset et al. [2016] Pont-Tuset J, Arbelaez P, Barron JT, et al. (2016) Multiscale combinatorial grouping for image segmentation and object proposal generation. Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) Porzi et al. [2019] Porzi L, Bulo SR, Colovic A, et al. (2019) Seamless scene segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Qiao et al. [2020] Qiao S, Chen LC, Yuille A (2020) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. arXiv preprint arXiv:200602334 Ren et al. [2015] Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments.
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. 
In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. 
[2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. 
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
[2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ren S, He K, Girshick RB, et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NeurIPS) Ruan et al. [2019] Ruan T, Liu T, Huang Z, et al. (2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. 
[2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Ruan T, Liu T, Huang Z, et al. 
(2019) Devil in the details: Towards accurate single and multiple human parsing. In: Conference on Artificial Intelligence (AAAI) Sakaridis et al. [2021] Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. 
[2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sakaridis C, Dai D, Van Gool L (2021) Acdc: The adverse conditions dataset with correspondences for semantic driving scene understanding. In: International Conference on Computer Vision (ICCV) Sofiiuk et al. [2019] Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. 
In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Sofiiuk K, Barinova O, Konushin A (2019) Adaptis: Adaptive instance selection network. In: International Conference on Computer Vision (ICCV) Tan and Le [2019] Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. 
(2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. 
[2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML) Tian et al. [2019] Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. 
In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR) Valada et al. [2018] Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. 
(2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV) Varma et al. [2019] Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV) Xie et al. [2020] Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. 
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) Xiong et al. [2019] Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. 
(2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019a] Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR) Yang et al. [2019b] Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. 
arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. 
(2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. 
(2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
(2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093 Yu et al. [2020] Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhang et al. [2022] Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. 
[2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2017] Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR) Zhao et al. [2018] Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV) Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM) Zhao et al. [2019] Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. 
In: International Conference on Computer Vision (ICCV) Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Tian Z, He T, Shen C, et al. (2019) Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Valada A, Mohan R, Burgard W (2018) Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision (IJCV)
- Varma G, Subramanian A, Namboodiri A, et al. (2019) Idd: A dataset for exploring problems of autonomous navigation in unconstrained environments. In: Winter Conference on Applications of Computer Vision (WACV)
- Xie Q, Luong MT, Hovy E, et al. (2020) Self-training with noisy student improves imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiong Y, Liao R, Zhao H, et al. (2019) Upsnet: A unified panoptic segmentation network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang L, Song Q, Wang Z, et al. (2019a) Parsing r-cnn for instance-level human analysis. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang T, Collins MD, Zhu Y, et al. (2019b) Deeperlab: Single-shot image parser. arXiv preprint arXiv:190205093
- Yu F, Chen H, Wang X, et al. (2020) Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhang H, Wu C, Zhang Z, et al. (2022) Resnest: Split-attention networks. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao H, Shi J, Qi X, et al. (2017) Pyramid scene parsing network. In: Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhao J, Li J, Cheng Y, et al. (2018) Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In: ACM International Conference on Multimedia (ACM-MM)
- Zhao Y, Li J, Zhang Y, et al. (2019) Multi-class part parsing with joint boundary-semantic awareness. In: International Conference on Computer Vision (ICCV)
- Shishir Muralidhara
- Sravan Kumar Jagadeesh
- René Schuster
- Didier Stricker